Introduction

B-cell acute lymphoblastic leukemia (B-ALL) is a highly heterogeneous disease defined by distinct gene expression profiles (GEPs) and various genetic lesions. However, without a well-curated GEP reference and a publicly accessible analysis platform, applying the advanced B-ALL classification system remains challenging, especially for samples with a low blast percentage (blast%). Using the largest consistently analyzed B-ALL RNA-seq dataset, we systematically delineate multiple layers of signatures for each B-ALL subtype to facilitate application of this granular subtyping system.

Methods

We analyzed 2,933 B-ALL RNA-seq samples from public datasets to identify chromosomal rearrangements, GEPs, somatic mutations, structural variations, and large-scale copy-number variations, and used these features to define B-ALL subtypes. CIBERSORTx was used for digital deconvolution of cell types from the bulk GEP. Single-cell (sc) RNA-seq, scBCR-seq, and targeted sc-genotyping were performed on the 10X Genomics platform to identify leukemic clones, which were then classified by GEP. V(D)J recombination of the BCR was assessed with TRUST4 in both bulk and scRNA-seq data. The SingleR package was used for single-cell B-ALL subtyping.

Results

Samples with a potentially low blast percentage (B-cell ratio estimated by CIBERSORTx) or low sequencing depth/coverage, GEP outliers (such as T-cell ALL), and samples from related individuals were removed. In total, 2,839 high-quality RNA-seq samples from 27 B-ALL subtypes were retained and thoroughly analyzed. Among the 27 subtypes, 19 are defined by characteristic genetic lesions and distinct GEPs and were straightforward to identify from RNA-seq. These subtypes can be categorized into 4 major classes: 1. gene rearrangements, which include ETV6-RUNX1, TCF3-PBX1, BCR-ABL1 (Ph), KMT2A-rearranged (KMT2Ar), DUX4r, MEF2Dr, ZNF384r, IGH-BCL2/IGH-MYC/IGH-BCL6 (BCL2/MYC), NUTM1r, and HLFr; 2. gross chromosomal alterations, which include high hyperdiploid, low hypodiploid, and iAMP21; 3. sequence mutations, which include PAX5 P80R and IKZF1 N159Y; and 4. others, which include subtypes carrying multiple signature genetic lesions, such as PAX5alt, defined by various types of PAX5 alterations, and ZEB2/CEBPE, defined by IGH-CEBPE and the ZEB2 H1038R mutation. Besides the subtypes with distinct GEPs, 5 subtypes are defined by genetic lesions alone or as phenocopies of subtypes with distinct GEPs: near haploid (by karyotype), Ph-like, ETV6-RUNX1-like, ZNF384-like, and KMT2A-like. Subgroups within canonical B-ALL subtypes were also identified, such as two distinct clusters in the Ph subtype, with the minor cluster showing a stronger pre-pro-B-cell signature than the major one. Using the single-cell gene expression reference developed from the 1 Million Immune Cell project (https://data.humancellatlas.org), we deconvoluted the bulk GEP of each B-ALL subtype. PAX5 P80R, KMT2A, and Ph showed high enrichment of pro-B cells, while the BCL2/MYC, TCF3-PBX1, and MEF2D subtypes were more enriched for late pre-B or even mature B cells.
BCR analysis showed that the V(D)J recombination patterns were consistent with the deconvoluted B-cell stages; for example, clonal BCR light chains were observed only in B-ALL subtypes with features of pre-B or more mature B cells.
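
As an illustration of how clonality could be read off V(D)J calls, the following is a minimal sketch, not the procedure used in this study. It assumes the rearrangement calls have already been reduced to one clonotype label per read or cell; the function names, the toy labels, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def dominant_clone_fraction(clonotypes):
    """Return (most frequent clonotype, its fraction of all calls).

    `clonotypes` is a list with one clonotype label per read or cell.
    An empty list yields (None, 0.0).
    """
    if not clonotypes:
        return None, 0.0
    counts = Counter(clonotypes)
    top, n = counts.most_common(1)[0]
    return top, n / len(clonotypes)


def is_clonal(clonotypes, min_fraction=0.5):
    """Flag a sample as clonal when one clonotype dominates.

    min_fraction is an illustrative cutoff, not a value from the study.
    """
    _, fraction = dominant_clone_fraction(clonotypes)
    return fraction >= min_fraction


# Toy light-chain calls: 8 of 10 share one V/J combination.
light_chain_calls = ["IGKV1-5/IGKJ2"] * 8 + ["IGKV3-20/IGKJ1", "IGLV2-14/IGLJ3"]
top, fraction = dominant_clone_fraction(light_chain_calls)
print(top, fraction, is_clonal(light_chain_calls))
```

A dominant light-chain clonotype in this toy input would correspond to the pre-B or more mature pattern described above, whereas a flat distribution of labels would not pass the cutoff.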

To develop a GEP-based B-ALL subtyping algorithm, a reference dataset of 2,071 cases from 19 subtypes was identified (Fig 1). With this GEP reference, B-ALL subtypes can be defined by various clustering algorithms. A graphical user interface was developed for automatic B-ALL classification. Using the bulk GEP reference, the single-cell GEP of a PAX5alt sample was tested for B-ALL classification. Cell types were resolved using the single-cell reference described above, and the B-ALL subtype was successfully identified in the leukemic cell cluster defined by clonal V(D)J recombination (Fig 2). This is a proof of concept that scRNA-seq can isolate homogeneous leukemic cells and support reliable GEP-based B-ALL classification, even for samples with low blast%.
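
One simple way to assign a subtype from a GEP reference is to correlate a query profile against a per-subtype reference profile and pick the best match. The sketch below illustrates that idea with Spearman correlation against subtype centroids. It is an assumption-laden toy example and not the algorithm, the SingleR procedure, or the data used in this study; the subtype labels, expression values, and function names are hypothetical.

```python
def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def classify(profile, centroids):
    """Return (best-matching subtype, all correlation scores)."""
    scores = {name: spearman(profile, ref) for name, ref in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical centroids over the same five signature genes, in the same order.
centroids = {
    "subtype_A": [9.0, 7.5, 1.0, 0.5, 3.0],
    "subtype_B": [1.0, 0.8, 8.0, 9.5, 3.2],
}
query = [8.2, 6.9, 1.5, 0.2, 2.8]
best, scores = classify(query, centroids)
print(best, scores)
```

Rank-based correlation is used here because it is insensitive to monotone scaling differences between the query and the reference, which is convenient when comparing a single-cell or pseudo-bulk profile with a bulk reference. A practical classifier would also need gene selection and a rule for rejecting low-confidence calls.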

Conclusions

Based on the largest B-ALL RNA-seq dataset, we present multi-omics signatures of B-ALL subtypes and a standard GEP reference for B-ALL classification. This reference dataset is highly reliable for both bulk and single-cell prediction of B-ALL subtypes.

No relevant conflicts of interest to declare.
